Cell-type-specific effects of autism-associated 15q duplication syndrome in the human brain

Summary Recurrent copy-number variation represents one of the most well-established genetic drivers in neurodevelopmental disorders, including autism spectrum disorder. Duplication of 15q11–q13 (dup15q) is a well-described neurodevelopmental syndrome that increases the risk of autism more than 40-fold. However, the effects of this duplication on gene expression and chromatin accessibility in specific cell types in the human brain remain unknown. To identify the cell-type-specific transcriptional and epigenetic effects of dup15q in the human frontal cortex, we conducted single-nucleus RNA sequencing and multi-omic sequencing on dup15q-affected individuals (n = 6) as well as individuals with non-dup15q autism (n = 7) and neurotypical control individuals (n = 7). Cell-type-specific differential expression analysis identified significantly regulated genes, critical biological pathways, and differentially accessible genomic regions. Although there was overall increased gene expression across the duplicated genomic region, cellular identity represented an important factor mediating gene-expression changes. As compared to other cell types, neuronal subtypes showed greater upregulation of gene expression across a critical region within the duplication. Genes that fell within the duplicated region and had high baseline expression in control individuals showed only modest changes in dup15q, regardless of cell type. Of note, dup15q and autism had largely distinct signatures of chromatin accessibility but shared the majority of transcriptional regulatory motifs, suggesting convergent biological pathways. However, the transcriptional binding-factor motifs implicated in each condition implicated distinct biological mechanisms: neuronal JUN and FOS networks in autism vs. an inflammatory transcriptional network in dup15q microglia. This work provides a cell-type-specific analysis of how dup15q changes gene expression and chromatin accessibility in the human brain, and it finds evidence of marked cell-type-specific effects of this genetic driver. These findings have implications for guiding therapeutic development in dup15q syndrome, as well as understanding the functional effects of copy-number variants more broadly in neurodevelopmental disorders.


Supplemental Figures
The LinkPeaks functionality in Signac was applied to the integrated multi-omic data-set, to identify significant peaks associated with genes within the duplicated region, which was calculated given GC content, accessibility, and length of the peak.The default thresholding was used which included p value < .05,Z-score < .05,and distance = 500,000 base pairs.There was variability in the number of peaks associated with genes within the duplication; there was not a clear correlation between number of peaks and pTriplo (Select pTriplo metrics shown above gene name for clarity, with significant pTriplo score in red.)Correlation analysis demonstrated a significant positive relationship (R=0.46,p < .001) between differential peak accessibility and gene expression for dup15q associated genes/peaks in neuron vs. glia comparison.After exclusion of ASD samples, the intersection of dup15q DEG lists between 1) CNV and control and 2) neurons and glia (both padj <.05 using FindMarkers function) was used to then create a list of linked peaks (with the LinkPeaks function) associated with dup15q region DEGs.Significantly differentially accessible peaks for the above DEG associated peaks were identified as above.Finally, the fold change expression in dup15q region DEGs was correlated with the differential accessibility of the associated peaks.Logfc = 0 Table S3: Technical sample information.Read coverage breakdown by sample for snRNA-seq and snATAC-seq components as indicated from Cell Ranger and Arc output.

Supplemental Methods
Optical Mapping of genomic DNA from post-mortem brain: 15-20 mg of brain tissue was homogenized for 10 seconds in Bionano Genomics Homogenization Buffer (P/N 20406) using a TissueRuptor, passed through a 40µm filter, then pelleted at 2000 x g for 5 minutes.The pellet was resuspended and re-pelleted from Bionano Genomics Wash Buffer A (P/N 20407).Homogenized tissue was then digested in 50µL of Protease solution and Bionano Genomics Lysis and Binding Buffer (LBB) (P/N 20375) on a HulaMixer for 15 minutes at 10 rpm.Tissue was further digested in Proteinase K on a HulaMixer for 15 minutes at 10 rpm.Enzymes were inhibited by PMSF.
gDNA binding was performed in the presence of Bionano Genomics Salting Buffer (P/N 20404) and 100% Isopropanol to a Bionano Genomics Nanobind Disk (P/N 20402) on a HulaMixer for 30 minutes at 10 rpm.Nanobinds were washed with Bionano Genomics Wash Buffer 1 (P/N 20376) and Wash Buffer 2 (P/N 20377) (2 rounds each) using a Dynamag Tube Rack.gDNA was eluted from Nanobind using Bionano Genomics Elution Buffer (P/N 20378).gDNA was homogenized for 1 hour at 15 rpm and incubated at room temperature for at least 3 days before labeling and staining.gDNA was fluorescently labeled and stained using the Bionano Genomics Direct Label and Stain Kit (P/N 80005) following Bionano Prep Direct Label and Stain protocol (P/N 30206).
Data was collected on the Saphyr instrument to a target throughput of 1500 Gbp or 400X effective coverage of the GRCh38 reference.Data was submitted to the Rare Variant Analysis pipeline on Bionano Solve version 3.6.1.The Rare Variant Pipeline aligns molecules to the reference to detect relative differences in the sample, forms consensus maps of molecule support in these loci and in the presence of sufficient molecule support, a structural variant call is made.The molecule data was subsequently down-sampled to 400Gbp or 100X coverage of the GRCh38 reference and submitted to the De Novo Assembly pipeline in Bionano Solve version 3.6.1 during which molecules were assembled de novo to form haplotype consensus maps, which were then aligned to the GRCh38 reference for structural variant detection and calling.In both scenarios, global reference coverage is used to determine copy number aberrations in addition to the aforementioned structural variant calling.Results were reviewed in Bionano Access 1.6.1.During review, structural variant and copy number calls were filtered against the Bionano provided mask files which remove structural variant and copy number calls in complex or poorly constructed regions of the GRCh38 reference.Variant calls were filtered against the Bionano control database of healthy individuals retaining only variants present in less than 1% of the control database.Variants were also filtered according to recommended confidence score filtering; 0 for indels, 0.7 for inversions, 0.3 for intrachromosomal fusions, 0.65 for intrachromosomal translocations, and no confidence scores were calculated for duplications where a placeholder value of -1 is used.Variants remaining after filtering were further curated by careful review of the data.
Figure S1: Dup15q sample validation.(All samples have tetrasomy of the PWACR) A. Dup15q samples have been extensively validated in the literature 6,41 .B. Two samples were also validated with optical genome mapping.Note that the ideogram represents only the copy number changes and not the chromosomal structure.

Figure
Figure S2: Impact of demographics.Top panel: Sample comparison.No significant differences in age, PMI or RIN.(p> .05, 1 way ANOVA, Tukey post-hoc test.There was also no difference in the female percentage between groups (p>.05, chisquare).Middle and bottom panel: PCA plots demonstrate sex and age (split as < and > 21 years of age) are not major determinants of gene expression variability.

Figure S3 :
Figure S3: Quality control and filtering.snRNA-seq (top 3) and multi-omic (bottom 3) quality metrics of raw (left) and filtered (right) nuclei.Each panel demonstrates library size, gene number, and proportion of mitochondrial genes (from left to right) and for ATAC-seq-ncount RNA, ATAC, TSS enrichment and nucleosome signal are shown.Top is dup15q, middle is ASD, and bottom is control.Number of nuclei in top left corner, number of doublets in each group prior to filtering also indicated.

Figure S4 :
Figure S4: CYFIP1 chromatin accessibility.Increased expression and chromatin accessibility at CYFIP1 in dup15q samples shown, as well as peak to peak linkage and cell-type-specific chromatin accessibility.Top panels represent signal from all nuclei combined.

Figure S5 .Figure S6 :
Figure S6: Impact of seizure diagnosis or other sources of individual variability on dup15q gene expression changes.Top: Presence of seizure diagnosis in ASD cases does not recapitulate dup15q gene expression changes observed within the duplicated (CNV) region as visualized with dot plot.Bottom: UBE3A expression is separated out by each individual sample (sample_age) and cell type to demonstrate that although there is notable heterogeneity between samples, effects are not being driven by an individual sample or factor such as age.

Figure
Figure S7: Neurobiological substrates in dup15q.Overlap of known (A) SFARI pathogenic ASD risk genes as well as (B) FMR1 protein target genes, demonstrates significant enrichment (pvalue indicated in figure, hypergeometric test) These genes demonstrate notable enrichment in excitatory neuron subtypes.C. Using gene ontology analysis, we also found evidence for biological process terms involved in synaptic function.Presented are results from differential expression analysis in layer 2/3 neurons in each condition vs. control.

Figure S8 :
Figure S8: Microglia gene ontology analysis.Using gene ontology analysis, we found evidence suggesting dysregulation of synaptic pruning only in dup15q microglia, but not ASD (compared to control).

Figure S9 :
Figure S9: Dup15q gene-peak analysis.The LinkPeaks functionality in Signac was applied to the integrated multi-omic data-set, to identify significant peaks associated with genes within the duplicated region, which was calculated given GC content, accessibility, and length of the peak.The default thresholding was used which included p value < .05,Z-score < .05,and distance = 500,000 base pairs.There was variability in the number of peaks associated with genes within the duplication; there was not a clear correlation between number of peaks and pTriplo (Select pTriplo metrics shown above gene name for clarity, with significant pTriplo score in red.)

Figure S10 :
Figure S10: Association between differential chromatin accessibility and gene expression for peaks significantly associated with dup15q genes in the CNV vs Control comparison.Correlation analysis demonstrated a significant positive relationship (R=0.46,p < .001) between differential peak accessibility and gene expression for dup15q associated genes/peaks in neuron vs. glia comparison.After exclusion of ASD samples, the intersection of dup15q DEG lists between 1) CNV and control and 2) neurons and glia (both padj <.05 using FindMarkers function) was used to then create a list of linked peaks (with the LinkPeaks function) associated with dup15q region DEGs.Significantly differentially accessible peaks for the above DEG associated peaks were identified as above.Finally, the fold change expression in dup15q region DEGs was correlated with the differential accessibility of the associated peaks.

Table S1 : Differentially expressed genes
. Number of significant differentially expressed genes per comparison.

Table S2 : Cell-type-specific differentially expressed genes. Number
of significant differentially expressed genes in high resolution differential expression analysis, in dup15q vs control comparison (p-value adj<0.05).